Consistency of Spectral Clustering
نویسندگان
چکیده
Consistency is a key property of all statistical procedures analyzing randomly sampled data. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of the popular family of spectral clustering algorithms, which clusters the data with the help of eigenvectors of graph Laplacian matrices. We develop new methods to establish that, for increasing sample size, those eigenvectors converge to the eigenvectors of certain limit operators. As a result, we can prove that one of the two major classes of spectral clustering (normalized clustering) converges under very general conditions, while the other (unnormalized clustering) is only consistent under strong additional assumptions, which are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering.
منابع مشابه
Analysis of spectral clustering algorithms for community detection: the general bipartite setting
We consider the analysis of spectral clustering algorithms for community detection under a stochastic block model (SBM). A general spectral clustering algorithm consists of three steps: (1) regularization of an appropriate adjacency or Laplacian matrix (2) a form of spectral truncation and (3) a k-means type algorithm in the reduced spectral domain. By varying each step, one can obtain differen...
متن کاملOn defining affinity graph for spectral clustering through ranking on manifolds
Spectral clustering consists of two distinct stages: (a) construct an affinity graph from the dataset and (b) cluster the data points through finding an optimal partition of the affinity graph. The focus of the paper is the first step. Existing spectral clustering algorithms adopt Gaussian function to define the affinity graph since it is easy to implement. However, Gaussian function is hard to...
متن کاملModel-free Consistency of Graph Partitioning
In this paper, we exploit the theory of dense graph limits to provide a new framework to study the stability of graph partitioning methods, which we call structural consistency. Both stability under perturbation as well as asymptotic consistency (i.e., convergence with probability 1 as the sample size goes to infinity under a fixed probability model) follow from our notion of structural consist...
متن کاملA variational approach to the consistency of spectral clustering
This paper establishes the consistency of spectral approaches to data clustering. We consider clustering of point clouds obtained as samples of a ground-truth measure. A graph representing the point cloud is obtained by assigning weights to edges based on the distance between the points they connect. We investigate the spectral convergence of both unnormalized and normalized graph Laplacians to...
متن کاملComparison of Combination Methods using Spectral Clustering Ensembles
We address the problem of the combination of multiple data partitions, that we call a clustering ensemble. We use a recent clustering approach, known as Spectral Clustering, and the classical K-Means algorithm to produce the partitions that constitute the clustering ensembles. A comparative evaluation of several combination methods is performed by measuring the consistency between the combined ...
متن کامل